
Trends in Hearing

SAGE Publications

Preprints posted in the last 90 days, ranked by how well they match Trends in Hearing's content profile, based on 12 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
Testing differential effects of periodicity and predictability in auditory rhythmic cueing of concurrent speech

MacLean, J.; Zhou, M.; Bidelman, G.

2026-03-13 neuroscience 10.64898/2026.03.11.711109 medRxiv
Top 0.1%
8.1%

Entrainment and predictive coding aid speech perception in both quiet and noisy environments. Isochronous, periodic auditory rhythmic cues facilitate entrainment and temporal expectations which can benefit encoding and perception of target speech. However, most studies using isochronous cues confound periodicity with predictability. To this end, we characterized how systematic changes in the acoustic dimensions of stimulus rate, target phase, periodicity, and predictability of an entraining sound precursor impact the subsequent identification of concurrent speech targets. Target concurrent vowel pairs were preceded by rhythmic woodblock cues which were either periodic-predictable (PP, isochronous rhythm), aperiodic-predictable (AP, accelerating rhythm), or aperiodic-unpredictable (AU, random rhythm). The number of pulses per rhythm was roved to further manipulate predictability. Stimuli also varied in presentation rate (2.5, 4.5, 6.5 Hz) and target speech phase (in-phase, 0°; out-of-phase, 90°, 180°) relative to the preceding entraining rhythm. We also measured participants' musical pulse continuation and standardized speech-in-noise perception abilities. We did not observe any effects of stimulus rhythm, rate, or target phase on target speech identification accuracy. However, reaction times were slowest at the nominal speech rate (4.5 Hz) and were most disrupted by out-of-phase presentations following the PP rhythm. Double-vowel task performance was associated with stronger musical pulse continuation abilities, but not speech-in-noise perception. Our results support the notion that entraining rhythmic cues rely on top-down processing but are relatively muted when stimulus predictability is unknown. Additionally, we find that individual differences in musical pulse perception may underlie the benefits of rhythmic cueing on subsequent speech perception.
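
To make the three precursor conditions concrete, here is a minimal sketch of how the pulse onset times could be generated; the rates and phase offsets come from the abstract, but the specific acceleration profile and jitter range are assumptions, not the authors' parameters.

```python
import numpy as np

def rhythm_onsets(kind: str, rate_hz: float = 4.5, n_pulses: int = 5,
                  rng: np.random.Generator | None = None) -> np.ndarray:
    """Pulse onset times (s) for one precursor rhythm."""
    rng = rng or np.random.default_rng()
    base_ioi = 1.0 / rate_hz                       # nominal inter-onset interval
    if kind == "PP":                               # periodic-predictable: isochronous
        iois = np.full(n_pulses - 1, base_ioi)
    elif kind == "AP":                             # aperiodic-predictable: accelerating
        iois = base_ioi * np.linspace(1.3, 0.7, n_pulses - 1)
    elif kind == "AU":                             # aperiodic-unpredictable: random
        iois = rng.uniform(0.5 * base_ioi, 1.5 * base_ioi, size=n_pulses - 1)
    else:
        raise ValueError(kind)
    return np.concatenate(([0.0], np.cumsum(iois)))

# Target phase is expressed as a fraction of the entraining cycle,
# e.g. a 90-degree offset one cycle after the final pulse at the 4.5 Hz rate:
onsets = rhythm_onsets("PP")
target_time = onsets[-1] + 1.0 / 4.5 * (1 + 90 / 360)
```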

2
Discrimination of spectrally sparse complex-tone triads in cochlear implant listeners

Augsten, M.-L.; Lindenbeck, M. J.; Laback, B.

2026-03-24 neuroscience 10.64898/2026.03.20.712905 medRxiv
Top 0.1%
6.8%

Cochlear implant (CI) users typically experience difficulties perceiving musical harmony due to a restricted spectro-temporal resolution at the electrode-nerve interface, resulting in limited pitch perception. We investigated how stimulus parameters affect discrimination of complex-tone triads (three-voice chords), aiming to identify conditions that maximize perceptual sensitivity. Six post-lingually deafened CI listeners completed a same/different task with harmonic complex tones, while spectral complexity, voice(s) containing a pitch change, and temporal synchrony (simultaneous vs. sequential triad presentation) were manipulated. CI listeners discriminated harmonically relevant one-semitone pitch changes within triads when spectral complexity was reduced to three or five components per voice, with significantly better performance for three-component compared to nine-component tones. Sensitivity was observed for pitch changes in the high voice or in both high and low voices, but not for changes in only the low voice. Single-voice sensitivity predicted simultaneous-triad sensitivity when controlling for spectral complexity and voice with pitch change. Contrary to expectations, sequential triad presentation did not improve discrimination. An analysis of processor pulse patterns suggests that difference-frequency cues encoded in the temporal envelope rather than place-of-excitation cues underlie perceptual triad sensitivity. These findings support reducing spectral complexity to enhance chord discrimination for CI users based on temporal cues.
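
The spectral-complexity manipulation is easy to picture as signal synthesis: each voice is a harmonic complex with three, five, or nine components, and a simultaneous triad sums three such voices. A minimal sketch (sampling rate, equal amplitudes, and note frequencies are illustrative assumptions, not the study's stimuli):

```python
import numpy as np

def complex_tone(f0_hz: float, n_components: int,
                 dur_s: float = 0.5, fs: int = 44100) -> np.ndarray:
    """Harmonics 1..n of f0 at equal amplitude; fewer components = sparser spectrum."""
    t = np.arange(int(dur_s * fs)) / fs
    tone = sum(np.sin(2 * np.pi * f0_hz * (k + 1) * t) for k in range(n_components))
    return tone / n_components

# A simultaneous triad is the sum of three voices, e.g. a C major chord:
chord = sum(complex_tone(f, n_components=3) for f in (261.6, 329.6, 392.0))
```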

3
Can Multimodal Large Language Models Visually Interpret Auditory Brainstem Responses?

Jedrzejczak, W.; Kochanek, K.; Skarzynski, H.

2026-04-17 otolaryngology 10.64898/2026.04.15.26350944 medRxiv
Top 0.1%
6.4%

Introduction: Auditory brainstem response (ABR) is a standard objective method for estimating hearing threshold, especially in patients who cannot reliably participate in behavioral audiometry. However, ABR interpretation is usually performed by an expert. This study evaluated whether two general-purpose artificial intelligence (AI) multimodal large language model (LLM) chatbots, ChatGPT and Qwen, can accurately estimate ABR hearing thresholds from ABR waveform images. The accuracy was measured by comparisons with the judgements of 3 expert audiologists. Methods: A total of 500 images each containing several ABR waveforms recorded at different stimulus intensities were analyzed. Three expert audiologists established the reference auditory thresholds based on visual identification of wave V at the lowest stimulus intensity, with the most frequent judgment among the three used as the reference. Each waveform image was independently submitted to ChatGPT (version 5.1) and Qwen (version 3Max) using the same standardized prompt and without additional clinical context. Agreement with the expert thresholds was assessed as mean errors and correlations. Sensitivity and specificity for detecting hearing loss (>20 dB nHL) were also calculated. In cases where the AI and expert thresholds nominally matched, corresponding latency measures were also compared. Results: Auditory thresholds derived from both LLMs correlated strongly with expert opinion, with Pearson r = 0.954 for ChatGPT and r = 0.958 for Qwen. ChatGPT showed a mean error of +5.5 dB and Qwen showed a mean error of -2.7 dB. Exact nominal agreement with expert values was achieved in 34.6% of ChatGPT estimates and 35.6% of Qwen estimates; agreement within +/-10 dB was observed in 75.6% and 80.0% of cases, respectively. For hearing-loss classification, ChatGPT achieved 100% sensitivity but low specificity (20.4%), whereas Qwen showed a more balanced profile with 91.6% sensitivity and 67.5% specificity. Curiously, estimates of wave V latency were markedly poor for both LLMs, with systematic underestimation and weak correlations with the expert judgements. Conclusion: ChatGPT and Qwen demonstrated a moderate ability to estimate ABR thresholds from waveform images, although their performance was not good enough for independent clinical use. Both models captured general patterns of hearing loss severity, but there was systematic bias, limited specificity and sensitivity balance, and poor latency estimation. General-purpose multimodal LLMs may have potential as assistive or preliminary tools, but clinically reliable ABR interpretation will likely require specialized, domain-trained AI systems with expert oversight.
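
The reported agreement statistics are straightforward to reproduce from paired threshold estimates; a sketch under the assumption that expert and model thresholds are stored as equal-length arrays in dB nHL (variable names are illustrative):

```python
import numpy as np
from scipy.stats import pearsonr

def threshold_agreement(expert: np.ndarray, model: np.ndarray,
                        cutoff_db: float = 20.0) -> dict:
    err = model - expert
    r, _ = pearsonr(expert, model)
    # Hearing loss defined as threshold > 20 dB nHL, as in the abstract.
    hl_true, hl_pred = expert > cutoff_db, model > cutoff_db
    return {"mean_error_db": err.mean(), "pearson_r": r,
            "exact_match": np.mean(err == 0),
            "within_10db": np.mean(np.abs(err) <= 10),
            "sensitivity": np.mean(hl_pred[hl_true]),
            "specificity": np.mean(~hl_pred[~hl_true])}
```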

4
Modeling the Influence of Bandwidth and Envelope on Categorical Loudness Scaling

Neely, S. T.; Harris, S. E.; Hajicek, J. J.; Petersen, E. A.; Shen, Y.

2026-04-01 neuroscience 10.64898/2026.03.30.715393 medRxiv
Top 0.1%
4.8%

In a loudness-matching paradigm, a reduction in the loudness of sounds with bandwidths less than one-half octave compared to a tone of equal sound pressure level has been observed previously for five-tone complexes at 60 dB SPL centered at 1 kHz. Here, this loudness-reduction phenomenon is explored using band-limited noise across wide ranges of frequency and level. Additionally, these measurements are simulated by a model of loudness judgement based on neural ensemble averaging (NEA), which serves as a proxy for central auditory signal processing. Multi-frequency equal-loudness contours (ELC) were measured for each of the adult participants (N=100) with pure-tone average (PTA) thresholds that ranged from normal to moderate hearing loss using a categorical-loudness-scaling (CLS) paradigm. Presentation level and center frequency of the test stimuli were determined on each trial according to a Bayesian adaptive algorithm, which enabled multi-frequency ELC estimation within about five minutes of testing. Three separate test conditions differed by stimulus type: (1) pure-tone, (2) quarter-octave noise and (3) octave noise. For comparison, loudness judgements for all three stimulus types were also simulated by the NEA model, which comprised a nonlinear, active, time-domain cochlear model with an appended stage of neural spike generation. Mid-bandwidth loudness reduction was observed to be greatest at moderate stimulus levels and frequencies near 1 kHz. This feature was approximated by the NEA model, which suggests involvement of an early stage of the central auditory system in the formation of loudness judgements.
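
The Bayesian adaptive stimulus selection mentioned here can be sketched generically as a grid posterior over loudness-function parameters, with each trial's (level, frequency) chosen to maximize expected information gain; the grid, likelihood, and selection rule below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def next_stimulus(posterior: np.ndarray, likelihood: np.ndarray, stimuli: list):
    """posterior: (n_params,) grid posterior over ELC model parameters;
    likelihood[s, k, r] = P(CLS response r | stimulus s, parameters k)."""
    h0 = -np.sum(posterior * np.log(posterior + 1e-12))  # current entropy
    best, best_gain = None, -np.inf
    for s, stim in enumerate(stimuli):
        gain = 0.0
        for r in range(likelihood.shape[2]):
            joint = likelihood[s, :, r] * posterior
            p_r = joint.sum()
            if p_r > 0:
                post_r = joint / p_r
                h_r = -np.sum(post_r * np.log(post_r + 1e-12))
                gain += p_r * (h0 - h_r)                  # expected entropy drop
        if gain > best_gain:
            best, best_gain = stim, gain
    return best
```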

5
Impacts of heminode disruption on auditory processing of noisy sound stimuli

Tripathy, S.; Budak, M.; Maddox, R.; Mehta, A. H.; Roberts, M. T.; Corfas, G.; Booth, V.; Zochowski, M.

2026-02-04 neuroscience 10.64898/2026.02.02.703242 medRxiv
Top 0.1%
4.3%

Hidden hearing loss (HHL) is an auditory neuropathy characterized by altered auditory nerve responses despite normal hearing thresholds. Recent experimental and computational studies suggest that permanent disruptions to heminode positions in spiral ganglion neuron (SGN) fibers can contribute to these deficits. However, the interaction between heminode disruption and noisy backgrounds ubiquitous in daily listening remains unexplored. This study investigates how background noise affects auditory processing with these peripheral disorders and how deficits propagate to downstream sound localization circuits in the superior olivary complex. We developed computational models of SGN fibers with mild and severe degrees of heminode disruption, subjected to sinusoidal tone stimuli in the presence of background noise with varying spectral characteristics. We analyzed the phase-locking of SGN fiber responses to the stimulus tone and modeled the subsequent effects on interaural time difference (ITD) sensitivity in the medial superior olive (MSO) using a binaural localization network. We found that near-tone-frequency noise disrupted SGN phase locking through cycle-to-cycle variability in spike phases, with effects consistent across tone frequencies. Mild heminode disruption produced frequency-dependent degradation in SGN phase locking, with effects observed only at higher frequencies tested (600-1000 Hz), without reducing overall firing rates. Critically, the effects of noise and heminode disruption were additive, with combined exposure leading to reduced ITD sensitivity and large temporal fluctuations in MSO responses. Severe heminode disruption, which additionally reduced firing rates at the SGN fibers and subsequent stages, produced profound localization deficits across all frequencies tested. Thus, our model results suggest that noisy environments exacerbate auditory deficits from peripheral disorders implicated in HHL and could potentially impair speech intelligibility through degradation in localization ability. This model may be useful for understanding the downstream impacts of SGN neuropathies.
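
Phase locking of model SGN responses to a tone is conventionally quantified by vector strength; whether the authors used exactly this metric is an assumption, but it illustrates the quantity at stake:

```python
import numpy as np

def vector_strength(spike_times_s: np.ndarray, tone_freq_hz: float) -> float:
    """1.0 = spikes perfectly locked to one stimulus phase; 0.0 = phases uniform."""
    phases = 2 * np.pi * tone_freq_hz * spike_times_s
    return float(np.abs(np.mean(np.exp(1j * phases))))
```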

6
Peripheral phoneme encoding and discrimination in aging and hearing impairment

Wouters, M.; Gaudrain, E.; Dapper, K.; Schirmer, J.; Baskent, D.; Ruettiger, L.; Knipper, M.; Verhulst, S.

2026-01-28 neuroscience 10.64898/2026.01.27.702044 medRxiv
Top 0.1%
4.0%

Speech perception difficulties in noise are common among older adults and individuals with hearing impairment, even when audiometric thresholds appear normal. We examined how aging, cochlear synaptopathy (CS), and outer hair cell (OHC) damage affect speech encoding and phoneme discrimination. Envelope-following responses (EFRs) to rectangular amplitude-modulated (RAM) tones and speech-like phoneme pairs were recorded in quiet using EEG, and behavioral discrimination was assessed in quiet, ipsilateral, and contralateral noise. Stimuli were designed to target temporal envelope (TENV) or temporal fine structure (TFS) encoding. Results showed that RAM-EFR amplitudes decreased gradually with age, consistent with emerging CS, while magnitudes of high-frequency TENV-based EFRs in quiet were most reduced in older hearing-impaired listeners with combined CS and OHC damage. In contrast, EFRs targeting low-frequency TENV encoding in quiet remained preserved. Behaviorally, phoneme discrimination of TFS contrasts worsened with OHC loss and age in quiet and contralateral noise, respectively, while there was no significant effect of age on the discrimination of TENV contrasts. Considering that high-frequency contrasts are discriminated via place-based spectral cues, low-frequency contrasts rely on TFS, and the EFR reflects primarily TENV, this framework explains why EFRs decline for high-frequency cues without perceptual loss, while EFRs remain stable for low-frequency cues even as TFS-based discrimination deteriorates. These findings highlight the need for further investigation into how neural coding deficits relate to perceptual outcomes. Combining electrophysiological and behavioral measures might provide a sensitive framework for detecting subclinical auditory deficits to diagnose age-related and hidden hearing loss earlier.

Highlights:
- Speech-evoked EEG shows OHC loss-related decline of high-CF envelope encoding.
- Speech-evoked EEG shows low-CF envelope encoding stays intact with age.
- Fine-structure contrast discrimination worsens with OHC loss in quiet.
- Fine-structure contrast discrimination worsens with age in contralateral noise.
- High-frequency place-based spectral cue discrimination remains robust with age.
- Peripheral coding strength is not directly reflected at the behavioral level.
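
EFR magnitude is conventionally read out at the stimulus modulation frequency from the spectrum of the trial-averaged EEG; a minimal sketch with assumed input shapes (the authors' exact pipeline, e.g. artifact rejection and polarity handling, is omitted):

```python
import numpy as np

def efr_amplitude(epochs: np.ndarray, fs: float, f_mod_hz: float) -> float:
    """epochs: (n_trials, n_samples) EEG locked to the RAM tone."""
    avg = epochs.mean(axis=0)                        # average across trials
    spectrum = 2 * np.abs(np.fft.rfft(avg)) / avg.size
    freqs = np.fft.rfftfreq(avg.size, d=1.0 / fs)
    return float(spectrum[np.argmin(np.abs(freqs - f_mod_hz))])
```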

7
Improving Automated Diagnosis of Middle and Inner Ear Pathologies by Estimating Middle Ear Input Impedance from Wideband Tympanometry

Kamau, A. F.; Merchant, G. R.; Nakajima, H. H.; Neely, S. T.

2026-03-31 otolaryngology 10.64898/2026.03.26.26349034 medRxiv
Top 0.1%
3.6%

Conductive hearing loss (CHL) with a normal otoscopic exam can be difficult to diagnose because routine clinical measures such as audiometric air-bone gaps (ABGs) can identify a conductive component but often cannot distinguish among specific underlying mechanical pathologies (e.g., stapes fixation versus superior canal dehiscence, which may produce similar audiograms). Wideband tympanometry (WBT) is a fast, noninvasive test that can provide additional mechanical information across a broad range of frequencies (200 Hz to 8 kHz). However, WBT metrics are influenced by variations in ear canal geometry and probe placement and can be challenging to interpret clinically. In this study, we extend prior WBT absorbance-based classification work by estimating the middle ear input impedance at the tympanic membrane (ZME), a WBT-derived metric intended to reduce ear canal effects. To estimate ZME, we fit an analog circuit model of the ear canal, middle ear, and inner ear to raw WBT data collected at tympanometric peak pressure (TPP). Data from 27 normal ears, 32 ears with superior canal dehiscence, and 38 ears with stapes fixation were analyzed. A multinomial logistic regression classifier was trained using principal component analysis (retaining 90% variance) and stratified 5-fold cross-validation with regularization. We compared feature sets based on ABGs alone, ABGs combined with absorbance, and ABGs combined with the magnitude of ZME. The combination of ABGs and the magnitude of ZME produced the best performance, achieving an overall accuracy of 85.6% compared to 80.4% for ABGs alone and 78.4% for ABGs combined with absorbance. These results suggest that incorporating model-derived middle ear impedance features with standard audiometric measures (ABGs) can improve automated pathology classification for stapes fixation and superior canal dehiscence.
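
The classifier described maps directly onto a standard scikit-learn pipeline; a sketch using the stated design choices (PCA retaining 90% variance, regularized multinomial logistic regression, stratified 5-fold cross-validation), with the remaining hyperparameters assumed:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# X: per-ear features (ABGs plus |ZME| by frequency)
# y: labels in {normal, superior canal dehiscence, stapes fixation}
clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.90),                    # keep 90% of variance
    LogisticRegression(C=1.0, max_iter=5000),  # regularized, multinomial
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
# accuracy = cross_val_score(clf, X, y, cv=cv).mean()
```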

8
From sound to source: Human and model recognition of environmental sounds

Alavilli, S.; McDermott, J. H.

2026-03-14 neuroscience 10.64898/2026.03.12.711349 medRxiv
Top 0.1%
3.2%

Our ability to recognize sound sources in the world is critical to daily life, but is not well documented or understood in computational terms. We developed a large-scale behavioral benchmark of human environmental sound recognition, built stimulus-computable models of sound recognition, and used the benchmark to compare models to humans. The behavioral benchmark measured how sound recognition varied across source categories, audio distortions, and concurrent sound sources, all of which influenced recognition performance in humans. Artificial neural network models trained to recognize sounds in multi-source scenes reached near-human accuracy and qualitatively matched human patterns of performance in many conditions. By contrast, traditional models of the cochlea and auditory cortex that were trained to recognize sounds produced worse matches to human performance. Models trained on larger datasets exhibited stronger alignment with both human behavior and brain responses. The results suggest that many aspects of human sound recognition emerge in systems optimized for the problem of real-world recognition. The benchmark results set the stage for future explorations of auditory scene perception involving salience and attention.

9
Neural Correlates of Listening States, Cognitive Load, and Selective Attention in an Ecological Multi-Talker Scenario

Shahsavari Baboukani, P.; Ordonez, R.; Gravesen, C.; Ostergaard, J.; Rank, M. L.; Alickovic, E.; Cabrera, A. F.

2026-03-15 neuroscience 10.64898/2026.03.13.711289 medRxiv
Top 0.1%
2.7%

This study assessed neural responses to continuous speech to classify listening state, cognitive load, and selective auditory attention in complex acoustic environments. EEG was recorded while participants listened to concurrent male and female talkers under two conditions: active listening, where attention was directed to one of two competing speakers (target vs. masker), or passive listening, where attention was diverted to a visual task. Cognitive load was varied by manipulating the target-to-masker ratio (TMR: +7 dB, -7 dB), with lower TMR representing more demanding listening conditions. Spectral EEG features across frequency bands were ranked with univariate statistics and used to classify listening state (active vs. passive) and cognitive load (low vs. high TMR). Auditory attention decoding (AAD) was performed using linear stimulus reconstruction to identify the target talker during active listening. Classification of listening state achieved 90.3% accuracy, and AAD reached 84.4% accuracy, demonstrating robust tracking of attentional engagement. In contrast, classification of cognitive load was near chance, suggesting that more extreme acoustic manipulations may be required to elicit distinct neural signatures. Comparable performance using a reduced set of electrodes near the ear indicates the potential for integration with wearable hearing devices. Overall, these results demonstrate that EEG can distinguish attentional states and selectively track target speech in realistic auditory scenarios. The findings provide a foundation for future applications in monitoring listening behavior, supporting auditory processing, and improving brain-controlled hearing aids in complex acoustic environments.

Highlights:
- Listening state (active vs. passive) can be classified from EEG spectral features.
- Attended speech can be decoded by reconstructing speech envelopes from EEG.
- Comparable accuracy is achieved using only electrodes placed around the ears.
- EEG can monitor listening state and track auditory attention in two-speaker settings.

Graphical Abstract: EEG signals were recorded while participants listened to two concurrent speech streams, either by actively attending to one speaker or by focusing on an unrelated visual task. Spectral features of the EEG were used to classify listening state (active vs. passive) and cognitive load (low vs. high TMR). Auditory attention decoding (AAD) was performed by reconstructing the speech envelope from the EEG time signal.

[Figure 1: Classification of listening state (active vs. passive), 90.3% accuracy. EEG difference between active and passive listening: left, power spectrum; right, topographic map (alpha band, 8-12 Hz). Classification of cognitive load (low vs. high TMR), near chance level. EEG difference between low and high TMR: left, power spectrum; right, topographic map (alpha band, 8-12 Hz).]
[Figure 2: AAD achieved 84.4% accuracy, indicating robust decoding of the attended speaker during active listening, while performance dropped to near chance during passive listening.]
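
Linear stimulus reconstruction, the AAD method named here, is typically a ridge-regression "backward model" from time-lagged EEG to the speech envelope; a compact sketch with assumed lags and regularization (in practice the decoder is trained on held-out trials, which is why it is passed in separately):

```python
import numpy as np
from sklearn.linear_model import Ridge

def lagged(eeg: np.ndarray, n_lags: int) -> np.ndarray:
    """Stack time-lagged copies: (n_samples, n_channels) -> (n_samples, n_channels * n_lags)."""
    return np.concatenate([np.roll(eeg, lag, axis=0) for lag in range(n_lags)], axis=1)

def decode_attention(eeg, env_target, env_masker, decoder: Ridge, n_lags: int = 32) -> str:
    recon = decoder.predict(lagged(eeg, n_lags))   # reconstructed envelope
    r_t = np.corrcoef(recon, env_target)[0, 1]
    r_m = np.corrcoef(recon, env_masker)[0, 1]
    return "target" if r_t > r_m else "masker"

# Training (on separate trials):
# decoder = Ridge(alpha=1e3).fit(lagged(eeg_train, 32), env_attended_train)
```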

10
Speech-in-Noise Difficulties in Aminoglycoside Ototoxicity Reflects Combined Afferent and Efferent Dysfunction

Motlagh Zadeh, L.; Izhiman, D.; Blankenship, C. M.; Moore, D. R.; Martin, D. K.; Garinis, A.; Feeney, P.; Hunter, L. R.

2026-03-26 otolaryngology 10.64898/2026.03.23.26348719 medRxiv
Top 0.1%
2.7%

Objectives: Patients with cystic fibrosis (CF) often receive aminoglycosides (AGs) to manage recurrent pulmonary infections, placing them at risk for ototoxicity. Chronic AG use can lead to complex cochlear damage affecting inner and outer hair cells, the stria vascularis, and spiral ganglion neurons. The greatest damage is typically in the basal cochlear region, which encodes high-frequency hearing, with additional involvement of more apical regions. While extended-high-frequency (EHF) hearing loss (EHFHL; 9-16 kHz) is often the earliest sign of AG ototoxicity, speech-in-noise (SiN) effects are rarely studied. Our overall hypothesis is that SiN perception difficulties in individuals with CF, treated with AGs, are related to combined cochlear and neural damage, primarily in the EHF range but also in the standard frequency (SF; 0.25-8 kHz) range. Three mechanisms that contribute to SiN perception were evaluated in children and young adults: 1) a primary effect of reduced EHF sensitivity, measured by pure-tone audiometry (PTA) and transient-evoked otoacoustic emissions (TEOAEs); 2) a secondary effect of subclinical damage in the SF range, measured by PTA and TEOAEs; and 3) additional neural effects, measured by middle ear muscle reflex (MEMR) threshold (afferent) and growth functions (efferent). Design: A total of 185 participants were enrolled; 101 individuals with CF treated with intravenous AGs and 84 age- and sex-matched controls without hearing concerns or CF. Assessments included EHF and SF PTA; the Bamford-Kowal-Bench (BKB)-SIN test for SiN perception; double-evoked TEOAEs with chirp stimuli from 0.71 to 14.7 kHz; and ipsilateral and contralateral wideband MEMR thresholds and growth functions using broadband stimuli. Results: Reduced sensitivity at EHFs (PTA, TEOAEs) was not associated with impaired SiN perception in the CF group. SF hearing, regardless of EHF status, was the primary predictor of SiN performance in the CF group. Increased MEMR growth was also significantly associated with poorer SiN in the CF group. Conclusions: In CF, impaired SiN perception was primarily predicted by SF hearing impairment, with additional involvement of the efferent auditory pathway through increased MEMR growth. These results build on prior evidence for efferent neural effects due to ototoxic exposures, supporting both sensory (afferent) and neural (efferent) mechanisms that contribute to listening difficulties in CF. Thus, preventive and intervention strategies should consider these combined mechanisms in people with AG ototoxicity to address their SiN problems.

11
Chronic acoustic degradation via cochlear implants alters predictive processing of audiovisual speech

Gastaldon, S.; Gheller, F.; Bonfiglio, N.; Brotto, D.; Bottari, D.; Trevisi, P.; Martini, A.; Vespignani, F.; Peressotti, F.

2026-01-27 neuroscience 10.64898/2026.01.25.701504 medRxiv
Top 0.1%
2.6%

This study provides the first neurophysiological evidence of how cochlear implant (CI) input affects predictive processing during audiovisual language comprehension in deaf individuals. Using EEG, we compared 18 CI users with 18 normal-hearing (NH) controls during sentence comprehension where final word predictability was determined by high or low semantic constraint (HC vs. LC) of the preceding sentence frame. Between the sentence frame and the final word, an 800 ms silent gap was introduced. Mouth visibility was manipulated during sentence frames (visible or digitally occluded; V+ vs. V-), while the final words were always presented with the mouth visible. In NH participants, lower-beta power (12-15 Hz) in left frontal and central sensors decreased for HC vs. LC contexts during the pre-target silent gap, but only when the mouth was visible, suggesting active prediction generation. In CI users, this lower-beta power decrease was absent. After final word presentation, both groups showed N400 predictability effects, indicating preserved prediction evaluation. However, CI users exhibited extended N400 effects in the V+ condition, suggesting additional processing demands. Across all participants, pre-target beta modulations correlated with language production abilities, supporting prediction-by-production frameworks. Within CI users, poorer audiometric thresholds correlated with larger N400 constraint effects, possibly indicating greater reliance on contextual prediction to compensate for degraded sensory input. These findings demonstrate that CI-mediated perception alters the neural mechanisms of prediction generation. The link between production skills and predictive mechanisms suggests that strengthening expressive language abilities may enhance predictive processing in CI users.

12
Acoustic Salience Drives Pupillary Dynamics in an Interrupted, Reverberant Task

Figarola, V.; Liang, W.; Luthra, S.; Parker, E.; Winn, M.; Brown, C.; Shinn-Cunningham, B. G.

2026-04-02 neuroscience 10.64898/2026.03.31.715639 medRxiv
Top 0.1%
1.9%

Listeners face many challenges when trying to maintain attention to a target source in everyday settings; for instance, reverberation distorts acoustic cues and interruptions capture attention. However, little is known about how these challenges affect the ability to maintain selective attention. Here, we measured syllable recall accuracy and pupil dilation during a spatial selective attention task that was sometimes disrupted. Participants heard two competing, temporally interleaved syllable streams presented in pseudo-anechoic or reverberant environments. On randomly selected trials, a sudden interruption occurred mid-sequence. Compared to anechoic trials, reverberant performance was worse overall, and the interrupter disrupted performance. In uninterrupted trials, reverberation reduced peak pupil dilation both when it was consistent across all stimuli in a block and when it was randomized trial to trial, suggesting temporal smearing reduced clarity of the scene and the salience of events in the ongoing streams. Pupil dilations in response to interruptions indicated perceptual salience was strong across reverberant and anechoic conditions. Specifically, baseline pupil size before trials did not vary across room conditions, and mixing or blocking of trials (altering stimulus expectations) had no impact on pupillary responses. Together, these findings highlight that stimulus salience drives cognitive load more strongly than does task performance.
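
Peak pupil dilation is conventionally computed relative to a pre-trial baseline; a minimal sketch with an assumed epoch layout (the authors' preprocessing, e.g. blink interpolation, is omitted):

```python
import numpy as np

def peak_dilation(pupil: np.ndarray, fs: float, baseline_s: float = 1.0) -> np.ndarray:
    """pupil: (n_trials, n_samples) epochs starting baseline_s before trial onset."""
    n_base = int(baseline_s * fs)
    baseline = pupil[:, :n_base].mean(axis=1, keepdims=True)
    return (pupil[:, n_base:] - baseline).max(axis=1)   # per-trial peak change
```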

13
The cortical contribution to the speech-FFR is not modulated by visual information

Riegel, J.; Schüller, A.; Wissmann, A.; Zeiler, S.; Kolossa, D.; Reichenbach, T.

2026-01-27 neuroscience 10.64898/2026.01.26.701703 medRxiv
Top 0.1%
1.8%

Seeing a speaker's face can significantly aid understanding, particularly in challenging acoustic environments. An early neural response implicated in audiovisual speech processing is the frequency-following response (speech-FFR), which occurs at the fundamental frequency of the speech signal. This response arises from both subcortical areas and the auditory cortex. Previous studies have shown that subcortical responses are reduced when bimodal stimulation includes visual input from the talker's face. Here, we examined the cortical contribution to the speech-FFR and its potential modulation by visual information. We recorded MEG responses to four types of audiovisual signals: a still image, an artificially generated avatar, a degraded video, and a natural video. The audio stimuli were presented in a substantial level of background noise to make behavioral audiovisual effects stand out. Speech-in-noise comprehension increased significantly from the audio-only condition to the avatar and the degraded video, and further to the natural video. Moreover, we found that all types of audiovisual stimuli yielded robust speech-FFRs in the auditory cortex at an early latency of around 30 ms. However, the magnitude of this neural response was neither enhanced nor attenuated by the videos, nor could the cortical contribution to the speech-FFR explain a significant portion of the variance in the behavioral comprehension scores. Our results suggest that visual modulation of the speech-FFR in the auditory cortex is, if existent, too small to be measurable in scenarios where speech occurs in considerable background noise.

14
Sound lateralization Ability is affected by saccade direction but not Eye Movement-Related Eardrum Oscillations (EMREOs)

Sotero Silva, N.; Bröhl, F.; Kayser, C.

2026-02-05 neuroscience 10.1101/2025.11.05.686724 medRxiv
Top 0.1%
1.7%

Eye-movement-related eardrum oscillations (EMREOs) are pressure changes recorded in the ear that supposedly reflect displacements of the tympanic membrane induced by saccadic eye movements. Previous studies hypothesized that the underlying mechanisms might play a role in combining visual and acoustic spatial information. Yet, whether and how the eardrum moves during an EMREO and whether this movement affects acoustic spatial perception remains unclear. Here we probed human acoustic lateralization performance for sounds presented at different times during a saccade (hence the EMREO) in two tasks, one relying on free-field sounds and one presenting sounds in-ear. Since EMREO generation likely involves the middle ear muscles, whose tension can alter sound transmission, it is possible that judgements of sound location may vary with the state of the EMREO at the time of sound presentation. However, when testing two specific hypotheses of how movements of the eardrum underlying the EMREO may affect spatial hearing, we found no evidence in support of this. Still, and in line with previous studies, we found that participants' lateralization responses were shaped by the spatial congruency of the saccade target direction and the sound direction. Thus, either the eardrum does not move directly as reflected by the EMREO signal, or despite its movement the underlying changes at the tympanic membrane have only minimal perceptual impact. Our results call for more refined studies to understand how the eardrum moves during a saccade and whether or how the EMREO impacts spatial perception.

15
Trial-By-Trial Auditory Brainstem Response Detection

Liu, G. S.; Ali, N.-E.-S.; O Maoileidigh, D.

2026-02-03 physiology 10.64898/2026.01.31.703019 medRxiv
Top 0.1%
1.7%

The neural response of the brainstem to brief sounds, known as the auditory brainstem response (ABR), is widely employed in the laboratory and the clinic to diagnose hearing loss. In contrast to behavioral methods that assess hearing using responses to sounds on a trial-by-trial basis, current ABR approaches are limited to analyzing the average ABR over hundreds of trials. Historically, trial-by-trial ABR analysis has not been possible owing to each trial's small signal-to-noise ratio. Here we overcome this limitation and show how to classify individual ABR trials as detected or undetected. We use the distribution of single-trial ABRs to assess supra-threshold hearing and to define psychophysics-like thresholds, which we call auditory brainstem detection (ABD) thresholds. ABD thresholds decrease as more of the ABR epoch is taken into account, whereas traditional ABR thresholds do not change. Above the ABD thresholds and below 90 dB SPL, signal detection is significantly improved by utilizing more of the ABR epoch. Our method also allows us to rank the supra-threshold hearing ability of individual subjects. Despite having normal ABR thresholds, some subjects appear to have supra-threshold hearing deficits. The trial-by-trial method demonstrates that signal detection by the ensemble of auditory neurons in the brainstem is intrinsically stochastic not only at low stimulus levels, but also at levels up to 100 dB SPL. Significance Statement: Neural responses to sound can be measured by electrodes placed on a subject's head and are commonly used in the laboratory and the clinic to assess hearing. Although the auditory system must distinguish each sound stimulus from intrinsic noise, current methods for analyzing the response of the brainstem to sound only utilize the average response to hundreds of stimuli. Here we overcome this constraint by showing how to classify an individual sound stimulus as detected or undetected based on each auditory brainstem response. This approach can assess hearing at all stimulus levels, indicates that subjects with normal hearing thresholds can exhibit supra-threshold hearing loss, and potentially extends the types of hearing deficits that can be diagnosed using auditory evoked potentials.
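
One way to realize trial-by-trial detection, sketched here as a matched-filter score compared against a criterion set from no-stimulus epochs; this is an illustrative stand-in, not necessarily the authors' classifier:

```python
import numpy as np

def classify_trials(trials: np.ndarray, template: np.ndarray,
                    noise_scores: np.ndarray, fpr: float = 0.05) -> np.ndarray:
    """trials: (n_trials, n_samples) single-trial ABR epochs;
    noise_scores: matched-filter scores from silent epochs."""
    scores = trials @ template / np.linalg.norm(template)
    criterion = np.quantile(noise_scores, 1 - fpr)   # fixes the false-positive rate
    return scores > criterion                        # detected / undetected per trial
```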

16
First Real-World Evidence Utilizing the Multidimensional Tinnitus Functional Index to Assess Treatment Impact with Bimodal Neuromodulation

Sabine, M. O.; Fligor, B. J.

2026-01-22 otolaryngology 10.64898/2026.01.20.26344445 medRxiv
Top 0.1%
1.4%

Purpose: Real-world evidence (RWE) is of practical significance as it enables the evaluation of whether findings observed in rigorously controlled clinical trial settings are generalizable to routine clinical practice. While Lenire, a bimodal neuromodulation tinnitus treatment device, has demonstrated efficacy and safety within controlled trials, further RWE from clinics is needed to reinforce these results. This is the first real-world study to assess the therapeutic effects of Lenire on tinnitus using the Tinnitus Functional Index (TFI), a multidimensional instrument designed to capture tinnitus severity and treatment responsiveness. The study correlates findings with the Tinnitus Handicap Inventory (THI), a well-established tool that assesses the perceived functional, emotional, and catastrophic impact of tinnitus and that was used in previous clinical trials and real-world studies. The use of an alternative validated outcome measure in a real-world study may add more feasible, relevant, and patient-centered research findings to the body of evidence for Lenire, while maintaining scientific credibility. Methods: A single-site, single-arm retrospective study examining patients fitted with the Lenire device was conducted. Ninety-seven patients with moderate or greater tinnitus severity used the Lenire device for 12 weeks, for up to 60 minutes a day. The primary outcome was change in tinnitus severity, assessed using the TFI at 6-week (FU1) and 12-week (FU2) follow-ups. The THI was included as a secondary outcome measure. Responder rates and mean score changes between initial assessment and FU1 and FU2 were compared using Z-tests for proportions and t-tests, respectively. Pearson correlations were used to examine the relationship between the TFI and THI change scores. Results: After 12 weeks of treatment, 73.4% [95% CI: 62.6%, 84.3%] of patients achieved a clinically significant improvement, defined as a reduction of at least 13 points on the TFI. This improvement was strongly supported by results from the THI, where 84.1% [95% CI: 75.1%, 93.2%] of patients met the minimum clinically important difference of 7 points. Mean score reductions were -25.9 (SEM 2.4) for the TFI and -28.0 (SEM 2.4) for the THI. Change scores from initial assessment to FU2 on the TFI and THI were highly correlated (r = 0.74, p < 0.001), indicating strong agreement between the two measures in capturing treatment-related improvements. All eight TFI subdomains showed reductions ranging from 18.5 to 31.4 points at FU2. Conclusions: This retrospective study demonstrates that 12 weeks of treatment with the Lenire device resulted in clinically meaningful improvements in tinnitus severity on the TFI, strongly supported by the THI. The high correlation between TFI and THI change scores indicates strong agreement between the two questionnaires in capturing treatment effects. Furthermore, all eight TFI subdomains showed notable reductions, underscoring the multidimensional impact of the treatment. These findings support the clinical utility of both the TFI and THI as complementary tools for evaluating treatment outcomes and guiding tinnitus management in routine practice.
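
The outcome statistics named in the abstract are standard; a sketch, assuming per-patient scores are available as arrays (the Wald interval below is one common choice for the responder-rate CI, not necessarily the authors'):

```python
import numpy as np
from scipy import stats

def responder_rate_ci(n_responders: int, n: int, z: float = 1.96):
    p = n_responders / n
    half = z * np.sqrt(p * (1 - p) / n)              # Wald 95% CI
    return p, (p - half, p + half)

# tfi_pre, tfi_fu2: per-patient TFI scores at baseline and 12 weeks
# t, p_val = stats.ttest_rel(tfi_pre, tfi_fu2)
# r, _ = stats.pearsonr(tfi_change, thi_change)      # agreement of change scores
```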

17
Multivariate Prediction of Conductive Dysfunction in Well and NICU Newborns using Wideband Acoustic Immittance with Acoustic Reflex Tests

Hunter, L. L.; Feeney, M. P.; Fitzpatrick, D.; Keefe, D. H.

2026-03-15 otolaryngology 10.64898/2026.03.13.26348314 medRxiv
Top 0.1%
1.4%

Objectives: The overall goal of this study was to assess tympanometric and ambient wideband acoustic immittance (WAI) tests and wideband acoustic reflex thresholds (ART) in well-baby and newborn intensive care unit (NICU) cohorts with three specific objectives: 1) assess the predictive accuracy of wideband tympanometry (WBT) and ART for conductive dysfunction in ears referring on the first or second stages of newborn hearing screening; 2) identify inadequate tests likely due to probe blockages or leaks; and 3) assess prediction models separately for well-baby and NICU screening outcomes. Design: Prospective, observational study of full-term (n=514) and premature newborns (n=239) recruited from well-baby and NICU nursery birth hospital newborn hearing screening programs. Wideband tympanometry, ambient absorbance, and acoustic reflexes were tested after Stage 1 transient otoacoustic emissions (TEOAE) screening. The reference standard for Pass or Refer groups was initially defined on the Stage 1 TEOAE test result. Pass or Refer groups were then reassigned based on the Stage 2 screening ABR for those who referred at Stage 1, and all NICU infants. Multivariate models were developed using reflectance and admittance variables to predict conductive dysfunction relative to the screening reference standard in a randomized sub-group of subjects at Stage 1 and Stage 2 screening. Classification accuracy was evaluated on a second, independent sub-group. Individual tests were classified as having inadequate probe fits if they had excessively low values of sound pressure level or susceptance (leak) or absorbance (blockage). Results: Differences in ambient absorbance for Pass vs. Refer screening groups revealed the greatest differences and effect sizes occurring in frequency bins between 1.4-2 kHz. Screening failure at both Stage 1 and 2 was most accurately predicted by models using ambient absorbance and power level variables at frequencies between 1-2.8 kHz, including ARTs. Tympanometric admittance variables at the positive-pressure tail for frequencies between 1-2.8 kHz in combination with the ART were more accurate predictors than those at peak pressure or the negative-pressure tail. Multivariate models generalized well to an independent group of infants at both Stage 1 and 2 for both the ambient and tympanometric models. Ambient tests revealed more inadequate tests than tympanometric tests, primarily due to blocked probe tips. Exclusion of ears to detect probe leaks or blockages slightly improved the ambient prediction models, but did not affect tympanometric models. Conclusion: Wideband acoustic reflex tests improved all models for ambient and tympanometric absorbance. Multivariate prediction models developed for WAI tests were repeatable in an independent group of well and NICU infants, suggesting that the results are generalizable to these populations. Detection of probe blockage or leaks slightly improved prediction for ambient measures. Pressurized tests have the advantage of ensuring probe seals due to the need for a hermetic seal, and thus are useful to ensure adequate probe insertion.

18
A meta-analysis of bone conduction 80 Hz auditory steady state response thresholds for adults and infants with normal hearing

Perugia, E.; Georga, C.

2026-02-14 otolaryngology 10.64898/2026.02.12.26346168 medRxiv
Top 0.1%
1.3%

Background: Auditory steady-state responses (ASSRs) provide an objective method for estimating hearing thresholds in individuals unable to provide behavioural responses. Bone conduction (BC) testing is required to differentiate conductive from sensorineural hearing loss. Accurate BC ASSR threshold estimation relies on "correction" factors, which are not yet well established. This meta-analysis evaluated the reliability of BC ASSR thresholds to estimate hearing thresholds at 500, 1000, 2000 and 4000 Hz. Methods: A systematic search of PubMed, the Cochrane Library, and Embase was conducted to identify studies involving normal-hearing (NH) and hearing-impaired (HI) participants of all ages. Outcomes were (1) the difference between behavioural and ASSR thresholds, and (2) ASSR thresholds. The risk of bias was evaluated using the Newcastle-Ottawa Scale. The mean and 95% confidence intervals (CI) were calculated for the thresholds at the four frequencies. The certainty of the evidence was assessed using the GRADE approach. Results: Of the records identified, 11 met the inclusion criteria, yielding a total of 27 studies. Sample sizes ranged from 60 to 249 participants across frequencies and age groups. The quality of records ranged from low to high. Data were synthesised using random-effects models due to heterogeneity. In NH adults, the mean differences (±95% CI) between BC ASSR thresholds and behavioural thresholds were 17.0 (±4.8), 15.5 (±6.0), 13.4 (±3.3), and 12.1 (±4.1) dB at 500, 1000, 2000, and 4000 Hz, respectively. In NH infants, mean (±95% CI) BC ASSR thresholds were 17.2 (±2.2), 10.5 (±3.6), 26.4 (±2.7), and 19.9 (±4.0) dB HL at the same frequencies. The certainty of the evidence was very low. Conclusions: BC ASSR can be a reliable method for estimating BC thresholds. However, age and frequency significantly impact BC ASSR thresholds, highlighting the need to develop "correction" factors to accurately predict BC behavioural thresholds. Registration: PROSPERO CRD42023422150.
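
Random-effects pooling of the kind described is most often DerSimonian-Laird; a minimal sketch for combining per-study means (the authors' software and estimator may differ):

```python
import numpy as np

def dersimonian_laird(means: np.ndarray, variances: np.ndarray):
    """Pool study means with between-study heterogeneity; returns mean and 95% CI."""
    w = 1.0 / variances                              # fixed-effect weights
    fe = np.sum(w * means) / np.sum(w)
    q = np.sum(w * (means - fe) ** 2)                # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(means) - 1)) / c)      # between-study variance
    w_re = 1.0 / (variances + tau2)
    mu = np.sum(w_re * means) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return mu, (mu - 1.96 * se, mu + 1.96 * se)
```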

19
The richness of little voices: using artificial intelligence to understand early language development

Petrache, M.; Carvallo, A.; Silva, V.; Barcelo, P.; Pena, M.

2026-01-31 neuroscience 10.64898/2026.01.30.702650 medRxiv
Top 0.1%
1.3%

How informative are preschoolers' speech vocalizations? Preschoolers' speech is often imprecise, highly variable and hard to interpret by humans and machines; consequently, its predictive value for later developmental outcomes remains underexplored. Here, we analyzed 6,595 brief vocalizations (0.5-5 s) from 127 preschoolers aged 3-4 years, including 74 children with diagnosed language delay, recorded in naturalistic environments. The vocalization models robustly distinguished children with and without language delay (ROC-AUC: 0.90), beyond the acoustic properties of the recordings (ROC-AUC: 0.62), and outperformed similar models analyzing metadata that the literature reports as predictive factors for early language development (ROC-AUC < 0.69 [95% CI: 0.08-0.15 to 0.48-0.73], P < 0.001). This indicates that neural networks applied to foundation-model audio vectorizations can extract meaningful developmental markers from brief samples of immature speech to classify speech status, offering a promising, scalable approach for early screening of language abilities.
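
The pipeline, in sketch form: embed each clip with a pretrained audio foundation model, then fit a lightweight classifier and report cross-validated ROC-AUC. The encoder and classifier choices below are assumptions, not the paper's specification:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def delay_screening_auc(X, y, n_folds: int = 5) -> float:
    """X: (n_clips, d) embeddings from a pretrained audio encoder;
    y: 1 = diagnosed language delay, 0 = typical development."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, y, cv=n_folds, scoring="roc_auc").mean()
```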

20
EEG correlates of auditory rise time processing: A systematic review

Manasevich, V.; Kostanian, D.; Rogachev, A.; Sysoeva, O.

2026-03-09 neuroscience 10.64898/2026.03.06.710012 medRxiv
Top 0.1%
1.0%

Rise time (RT) is considered to be one of the most significant acoustical characteristics of auditory speech stimuli. A substantial amount of data has been accumulated on the neurophysiological mechanisms of RT processing under different conditions and in different groups of people, but these data have not been systematised. This review focuses on studies that have investigated electroencephalographic (EEG) markers of RT sensitivity. The literature search was conducted according to the PRISMA statement in the PubMed, Web of Science and APA PsycInfo databases. The resultant review comprised 37 studies that considered diverse aspects of RT processing. The review describes the main stimulation parameters affecting electrophysiological markers of RT processing reflected in different components of event-related potentials (ERPs), brainstem responses and cortical rhythmic activity. The main finding of this review is that rise time prolongation leads to a decrease in the amplitude of the main ERP components and an increase in their latencies. However, the sensitivity of the EEG markers varied, with the earliest components tracking subtle differences (a few tens of microseconds) and the later components coding larger ones (up to 500 ms). Nevertheless, the observed effects may vary and depend on aspects of the experimental paradigm, the age of participants and speech-related problems. Future research may benefit from addressing understudied clinical groups and ERP components such as P1 and N2, which dominate in children.